Identifying Historical Period and Ethnic Origin of Documents Using Stylistic Feature Sets

Authors

  • Yaakov HaCohen-Kerner
  • Hananya Beck
  • Elchai Yehudai
  • Dror Mughaz
Abstract

Text classification is an important and challenging research domain. This paper investigates identifying the historical period and ethnic origin of documents using stylistic feature sets. The application domain is Jewish Law articles written in Hebrew-Aramaic. Such documents present various interesting problems for stylistic classification. Firstly, these documents include words from both languages. Secondly, Hebrew and Aramaic are morphologically richer than English. The classification is done using six different sets of stylistic features: quantitative features, orthographic features, topographic features, lexical features and vocabulary richness. Each set of features includes various baseline features, some of them formalized by us. SVM was chosen as the machine learning method since it has been very successful in text classification. The quantitative set was found to be very successful and superior to all other sets. Its features are domain-independent and language-independent. It will be interesting to apply these feature sets in general, and the quantitative set in particular, to other domains as well as to other ...
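As a rough illustration of what a quantitative, language-independent feature set might look like, the sketch below computes a few simple counts from raw text. The abstract does not list the paper's individual features, so the function name `quantitative_features` and the four features chosen here are assumptions made for illustration only, not the authors' feature set.

```python
import re


def quantitative_features(text: str) -> dict:
    """Compute a few simple counts from raw text.

    The feature choice is hypothetical; the abstract does not
    enumerate the paper's quantitative features.
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # \w+ matches runs of Unicode letters, digits and underscores,
    # so Hebrew or Aramaic tokens are matched as well as English ones.
    words = re.findall(r"\w+", text)
    n_words = len(words)
    n_sentences = len(sentences)
    return {
        "num_words": n_words,
        "num_sentences": n_sentences,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / n_sentences if n_sentences else 0.0,
    }
```

Because nothing here depends on a dictionary or a morphological analyzer, the same function can be applied unchanged to texts in any language. The resulting numeric vectors could then be passed to a classifier such as the SVM mentioned in the abstract.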

Similar resources

Elimination of the Elements of the Sentense in Sahife-ye-Shahi Book

Language always tends toward brevity, that is, it tries to convey its intentions using the fewest possible words. The consequence of this process is contingencies such as deletion of sentence components. Poets and writers sometimes omitted some of the components of the word in order to summarize the word and, of course, to observe the principles of rhetoric, punctilios and syntactic ...

Quantitative patterns of stylistic influence in the evolution of literature.

Literature is a form of expression whose temporal structure, both in content and style, provides a historical record of the evolution of culture. In this work we take on a quantitative analysis of literary style and conduct the first large-scale temporal stylometric study of literature by using the vast holdings in the Project Gutenberg Digital Library corpus. We find temporal stylistic localiz...

O-1: Evaluation of Ethnic Patterns of Y Chromosome Microdeletions in Iranian Infertile Men with Azoospermia/Severe Oligospermia Referred to Royan Institute

Background: Microdeletions of the long arm of the Y chromosome are the most common molecular genetic cause of severe infertility in men and affect three regions: AZFa, AZFb and AZFc (Azoospermia factor). These regions contain various genes involved in spermatogenesis. The effect of ethnicity on the patterns of Y chromosome microdeletions has not been extensively studied, particularly in Iran...

"Étude comparative de trois ensembles de descripteurs de texture pour la segmentation de documents anciens"

Recently, texture-based features have been used for digitized historical document image segmentation. It has been proven that these methods work effectively with no a priori knowledge. Moreover, it has been shown that they are robust when they are applied on degraded documents under different noise levels and kinds. In this paper an approach of evaluating ... (CIFED 2014, pp. 41–56, Nancy, 18-21 mar...)

Historical Memory, Ethnic Identity and Globalization: an Intergenerational Study in Sanandaj

Identity and collective memory are among the fascinating subjects in modern sociology. In Iran, the topic arose again as the modern nation-state discourse emerged, and it has been endlessly discussed since then. Such subjects have assumed an increasing importance due to the multi-cultural and multi-ethnic existence of Iranian society and recent global developments. Accordingly, the present study ha...

Publication date: 2006